Data Mining with Cubegrades
Abstract
Much interest has been expressed in database mining using association rules (Agrawal, Imielinski, & Swami, 1993). In this article, I provide a different view of association rules, referred to as cubegrades (Imielinski, Khachiyan, & Abdulghani, 2002). A typical association rule states, say, that in 23% of supermarket transactions (so-called market basket data), customers who buy bread and butter also buy cereal (this percentage is called confidence), and that in 10% of all transactions customers buy bread and butter (this percentage is called support). Bread and butter form the body of the rule, and cereal constitutes its consequent. Such a statement is typically represented as a probabilistic rule.
But association rules can also be viewed as statements about how the cell representing the body of the rule is affected when it is specialized by adding an extra constraint, expressed by the rule’s consequent. Indeed, the confidence of an association rule can be viewed as the ratio by which support drops when the cell corresponding to the body of the rule (here, the cell of transactions that include bread and butter) is augmented with its consequent (here, cereal): confidence = support(body ∪ consequent) / support(body). This interpretation gives association rules a dynamic flavor, reflected in a hypothetical change of support effected by specializing the body cell to a cell whose description is the union of the body and consequent descriptors. For example, the earlier association rule can be read as saying that the count of transactions including bread and butter drops to 23% of its original value when restricted (rolled down) to the transactions that include bread, butter, and cereal. In other words, the rule states how the count of transactions in which customers buy bread and butter is affected when they also buy cereal.
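The cell-based reading of confidence can be checked with a minimal sketch. It assumes that transactions are represented as Python frozensets of item names and uses a handful of invented toy transactions (not data from the article); the helper names `cell_count` and `confidence_as_support_drop` are hypothetical. A cell is described by a set of items, and its count is the number of transactions that contain all of them.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]

def cell_count(transactions: Iterable[Transaction], descriptor: FrozenSet[str]) -> int:
    """Count the transactions in the cell described by `descriptor` (those containing all its items)."""
    return sum(1 for t in transactions if descriptor <= t)

def confidence_as_support_drop(transactions: List[Transaction],
                               body: FrozenSet[str],
                               consequent: FrozenSet[str]) -> float:
    """Confidence read as the fraction of the body cell that survives specialization by the consequent."""
    body_count = cell_count(transactions, body)
    if body_count == 0:
        raise ValueError("empty body cell: confidence is undefined")
    # Specialize (roll down) the body cell: its descriptor becomes body ∪ consequent.
    specialized_count = cell_count(transactions, body | consequent)
    return specialized_count / body_count

# Toy market-basket data, invented for illustration only.
transactions = [
    frozenset({"bread", "butter", "cereal"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "cereal", "milk"}),
    frozenset({"milk"}),
]
body = frozenset({"bread", "butter"})
support = cell_count(transactions, body) / len(transactions)
confidence = confidence_as_support_drop(transactions, body, frozenset({"cereal"}))
print(support, confidence)
```

With this toy data the body cell holds 4 of the 5 transactions (support 0.8), and rolling it down to include cereal leaves 2 of those 4, so the confidence, read as a support drop, is 0.5.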
With this interpretation in mind, a much more general view of association rules can be taken: support (count) can be replaced by an arbitrary measure or aggregate, and the specialization operation can be replaced by a different “delta” operation. Cubegrades capture this generalization. Conceptually, this is very similar to the notion of a gradient in calculus. The gradient of a function between the domain points x1 and x2 is the ratio of the change in the function value to the change between the points, (f(x2) − f(x1)) / (x2 − x1). For a given point x and function f(), it can be interpreted as a statement of how a change in the value of x (∆x) affects the value of the function (∆f(x)).
From another viewpoint, cubegrades can be considered as defining a primitive for cubes. An n-dimensional cube is a group of k-dimensional (k ≤ n) cuboids arranged by the dimensions of the data. A cell represents an association of a measure m (e.g., total sales) with a member of every dimension. Online Analytical Processing (OLAP) is traditionally concerned with evaluating one or more measure values of the cells in the cube. Cubegrades allow a broader, more dynamic view: in addition to evaluating the measure values in a cell, they evaluate how those values change in response to a change in the cell’s dimensions. OLAP has traditionally defined operators such as drill-down and roll-up, but the cubegrade operator differs from them in that it returns a value measuring the effect of the operation. Additional operators have been proposed to evaluate or measure cell interestingness (Sarawagi, 2000; Sarawagi, Agrawal, & Megiddo, 1998). For example, Sarawagi et al. compute an anticipated value for a cell from the values in its neighborhood, and a cell is considered an exception if its value differs significantly from its anticipated value. Cubegrades differ in that they perform a direct cell-to-cell comparison.
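The generalization from counts to arbitrary measures can be illustrated with a small sketch. It assumes a flat list of records with two dimensions (`region`, `product`) and one numeric measure (`sales`), represents a cell as a dictionary of dimension constraints, and takes the cubegrade value as the ratio of each measure in the target cell to its value in the source cell, by analogy with confidence as a count ratio. The names, the data, and the choice of ratio as the delta are illustrative assumptions, not the formal definition from Imielinski, Khachiyan, and Abdulghani (2002).

```python
from statistics import mean
from typing import Callable, Dict, List, Mapping

# Hypothetical representation: a record maps dimension names (and "sales") to values,
# and a cell is a set of dimension -> value constraints.
Record = Mapping[str, object]
Cell = Dict[str, object]

def cell_records(data: List[Record], cell: Cell) -> List[Record]:
    """Return the records that satisfy every dimension constraint of the cell."""
    return [r for r in data if all(r.get(dim) == val for dim, val in cell.items())]

# Measures (aggregates) evaluated over a cell's records; other aggregates could be added.
MEASURES: Dict[str, Callable[[List[Record]], float]] = {
    "count": lambda rs: float(len(rs)),
    "avg_sales": lambda rs: mean(float(r["sales"]) for r in rs),
}

def specialize(cell: Cell, dim: str, value: object) -> Cell:
    """Roll down: add a constraint on one more dimension."""
    return {**cell, dim: value}

def generalize(cell: Cell, dim: str) -> Cell:
    """Roll up: drop the constraint on one dimension."""
    return {d: v for d, v in cell.items() if d != dim}

def cubegrade(data: List[Record], source: Cell, target: Cell) -> Dict[str, float]:
    """Assumed delta: ratio of each measure in the target cell to its value in the source cell."""
    src, tgt = cell_records(data, source), cell_records(data, target)
    if not src or not tgt:
        raise ValueError("source and target cells must be non-empty")
    return {name: m(tgt) / m(src) for name, m in MEASURES.items()}

# Toy data, invented for illustration only.
data = [
    {"region": "east", "product": "cereal", "sales": 10},
    {"region": "east", "product": "bread", "sales": 30},
    {"region": "west", "product": "cereal", "sales": 20},
    {"region": "west", "product": "bread", "sales": 40},
    {"region": "east", "product": "cereal", "sales": 20},
]

east = {"region": "east"}
east_cereal = specialize(east, "product", "cereal")
spec = cubegrade(data, east, east_cereal)  # count 2/3, avg_sales 15/20
gen = cubegrade(data, east_cereal, generalize(east_cereal, "region"))  # count 3/2
print(spec)
print(gen)
```

Specializing the `region = east` cell by `product = cereal` keeps two of its three records, and average sales fall from 20 to 15, so the sketch reports ratios of about 0.67 for the count and 0.75 for average sales. Unlike a plain drill-down, which would return the target cell's own values, the function returns the effect of the operation on each measure.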